Appears in the 3e Colloque International sur les Grammaires d'Arbres Adjoints (TAG+3).
Technical Report TALANA-RT-94-01, TALANA, Université Paris 7, 1994.

Constraining Lexical Selection Across Languages Using TAGs*



Dania Egedi 

Institute for Research in Cognitive Science 

University of Pennsylvania 

Philadelphia PA 19104-6228 

egedi@linc.cis.upenn.edu 



Martha Palmer 

Department of Computer Science 

University of Pennsylvania 

Philadelphia PA 19104-6389 

mpalmer@linc.cis.upenn.edu 






Lexical selection in Machine Translation consists of several related components. Two that have received 
a lot of attention are lexical mapping from an underlying concept or lexical item, and choosing the cor- 
rect subcategorization frame based on argument structure. Because most MT applications are small or 
relatively domain specific, a third component of lexical selection is generally overlooked - distinguishing 
between lexical items that are closely related conceptually. While some MT systems have proposed using 
a 'world knowledge' module to decide which word is more appropriate based on various pragmatic or sty- 
listic constraints, we are interested in seeing how much we can accomplish using a combination of syntax 
and lexical semantics. By using separate ontologies for each language implemented in FB-LTAGs, we are 
able to elegantly model the more specific and language dependent syntactic and semantic distinctions nec- 
essary to further filter the choice of the lexical item. 



1 Introduction 

One of the primary tasks for machine translation is lexi- 
cal selection - selecting the target lexical item that most 
closely matches the source lexical item being trans- 
lated. For transfer based approaches such as Transtar 
[17] and Geta [15], each separate lexeme in the source 
language must be paired with a corresponding lexeme 
in the target language in a set of bilingual dictionaries.
An alternative¹ is the interlingua approach, such
as Princitran [3] or Translator [8], in which the source 
verb is mapped to a canonical semantic representation 
which is shared by all target languages. The elements 
of the semantic representation select the lexical choice 
in each target language. 

There are several components of lexical selection. 
Two that have received a lot of attention are lexical 
mappings from an underlying concept or lexical item, 
and choosing the correct subcategorization frame based 
on argument structure. Because most MT applications 
are small or relatively domain specific, a third compo- 
nent of lexical selection is generally overlooked - distin- 
guishing between lexical items that are closely related 
conceptually. There can be many shades of distinction 
between the meaning of a lexical item in one language 
and its counterpart in another language [11]. These 
distinctions are sometimes critical to selecting the correct
lexical item in the target language. The question
then arises, in both transfer and interlingua based systems,
of how and where to capture these distinctions.
While some MT systems have relegated this task to a
'world knowledge' or 'pragmatics' module [2, 14], we
are interested in seeing how much we can accomplish
using a combination of syntax and lexical semantics. In
this paper, we outline a proposal to capture these distinctions
based on separate ontologies for each individual
language. Our method is applicable to both transfer
and interlingua based approaches, and provides a
more elegant solution than exhaustive enumeration and
a more local solution than reliance on 'world knowledge'
modules. This method has been partially implemented
in FB-LTAGs [4, 12, 16], whose feature-based, lexicalized
approach provides an advantageous environment
for modelling the more specific and language dependent
syntactic and semantic distinctions necessary to
further filter the choice of the lexical item.

*We would like to thank Aravind Joshi, Sadao Kurohashi, and
Zhibiao Wu for their helpful input.

¹These are the two ends of the spectrum, and many systems
now take a hybrid approach. Since the purpose of this paper
is to highlight an area of MT usually ignored, and to propose a
non-theory specific solution, we will not give an overview of all
types of MT systems. We do limit our initial comments to non-statistical
MT methods, as we do not believe that our method
would be useful to purely statistical systems.



2 Defining the Problem 

The essence of the problem that we are trying to solve 
involves lexical constraints that are critical for one language
but nonexistent or completely different in another.
A classic example of this is the Japanese wear
example.

(1) kare wa boushi wo kaburu.
    he      hat       wear
    He wears a hat.

(2) kare wa kutsushita wo haku.
    he      socks         wear
    He wears a pair of socks.

Sentences (1) and (2) highlight a situation in which 
one language (Japanese) distinguishes several senses of 
a concept /WEAR/ that has only one sense in another
language (English). In Japanese, kaburu selects
for items worn on the head, such as hats, while haku
selects for items such as socks. English wear does not 
make this lexical distinction. 
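The contrast in sentences (1) and (2) can be sketched as a lookup in which each Japanese verb carries a restriction on its object. This is only an illustration: the feature name `worn_on` and its values are hypothetical labels introduced here, and only the verbs and nouns themselves come from the examples above.

```python
# Hypothetical selectional restrictions for the two Japanese /WEAR/ verbs.
# The feature name "worn_on" and its values are illustrative assumptions.
JAPANESE_WEAR = {
    "kaburu": {"worn_on": "head"},
    "haku": {"worn_on": "feet"},
}

# Hypothetical noun features for the objects in sentences (1) and (2).
NOUNS = {
    "boushi": {"worn_on": "head"},      # hat
    "kutsushita": {"worn_on": "feet"},  # socks
}


def select_wear_verb(noun):
    """Return the /WEAR/ verbs whose object restriction matches the noun."""
    features = NOUNS[noun]
    return [
        verb
        for verb, restriction in JAPANESE_WEAR.items()
        if all(features.get(key) == value for key, value in restriction.items())
    ]


print(select_wear_verb("boushi"))
print(select_wear_verb("kutsushita"))
```

English wear would need no such restriction, which is why the distinction has to be stated on the Japanese side.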

A similar problem occurs when translating English 
break into Chinese. The semantic features used in se- 
lecting the correct verb construction in Chinese (such 
as the initial shape of the object, choice of instrument) 
are not all used in selecting English verb senses. This 
causes difficulties for a large-scale transfer based system 
such as TRANSTAR, a commercial broad coverage En- 
glish/Chinese MT system developed in Beijing. When 
this system is applied to sentences from the Brown cor- 
pus that contain break, an accuracy rate of less than 
20% is achieved, even after ruling out idiomatic uses 
and problems with parsing [17]. The primary reason 
is that in English break can be thought of as a very 
general verb indicating an entire set of breaking events 
which can be distinguished by the resulting state of 
the object being broken. Shatter, snap, split, etc. can 
all be seen as more specialized versions of the general 
breaking event. Chinese has no equivalent verb for in- 
dicating this class of breaking events, and each usage of 
break has to be mapped onto a more specialized lexical 
item. Even the English specializations of a breaking 
event do not cover all of the different ways in which 
Chinese can semantically distinguish between breaking 
events. The end result is that lexical selection from 
English to Chinese is often predicated on the existence 
of semantic features that are completely irrelevant to 
English. 

This is not a problem that is unique to English and 
Chinese or Japanese. In looking for cross-linguistic se- 
mantic universals for break and other semantically sim- 
ilar verbs, Pye found that there were as many different 
semantic classification schemes as there were languages 
being investigated [11]. The solution to this problem 
is elusive enough when considering two particular lan- 
guages. It must be recognized that a typical transfer- 
based approach requires a direct mapping from each 
distinct verb sense to its corresponding lexical item in 
the target language, and must therefore specify all of 
the semantic features relevant to both languages. The 
interlingua approach has a similar difficulty, since it 
must define an interlingua that can take into account 
all of the semantic features for both languages. When 
one begins to consider the problem from the perspective
of several languages, this technique quickly becomes
impractical. The direct mapping approach becomes
cumbersome, unwieldy, and extremely tedious to
build, since it means reanalyzing the semantic features 
of each language according to every language that it is 
being paired with. For the interlingua, a vast, language 
universal ontology must be built that incorporates ev- 
ery semantic feature for every language in an organized 
fashion. That means that not only do correspondences
have to be found between individual lexical items, but
also between the classification schemes by which each
language structures its concepts. While there has been
a lot of promising recent work on the problem of verb
classification, it is not clear that it supports the notion
of a readily accessible language universal ontology.
For instance, Levin [6] has shown that there is a correspondence
between lexical-semantic verb classes and
syntactic structure for English and there has been spec- 
ulation that these verb classes should extend to other 
languages since they are based on cross-linguistic se- 
mantic concepts. Mitamura, however, has determined 
a classification for Japanese verbs that shows very lit- 
tle correspondence to Levin's classes [7]. The EDR 
project, an enormous effort (over 200,000 words) to 
build an English-Japanese bilingual dictionary based 
on a joint conceptual classification, has found a conceptual
overlap between the two languages of only about
10% [18]. Another large ongoing effort in France has 
also been looking at generalizations about verb classes 
in French that can be made based on allowable syn- 
tactic transformations. This work is currently being 
extended to several other languages, but each language 
is being done independently, from the ground up, with 
very little sharing of classification schemes [5]. None 
of this rules out the possibility of semantic universals, 
or large areas of conceptual overlap between languages, 
but it does highlight the extreme individuality of each 
language, and the overwhelming task that lies in front 
of anyone trying to merge language-specific conceptu- 
alizations. 



3 Proposed Model 

We believe that the most practical approach is to as- 
sume that each language will require its own concep- 
tual ontology with a distinct set of semantic features. 
Many of the concepts in the lexical semantic ontologies 
may be shared among languages, but languages may 
choose to structure the concepts differently. With this 
in mind, we suggest an approach to translation that 
does not always attempt to directly map a specific verb 
sense in the source language to another specific sense 
in the target language. Rather, it begins with a more 
coarse-grained lexical translation process, which merely 
attempts to focus on a particular set of translation can- 
didates in the source language. These candidates will 
be further narrowed down by a language specific lexical 
selection process which examines the semantic features 
associated with the instantiated verb arguments and 
determines the best fit. Therefore, in many cases, the 
detailed merging of language specific semantic features 
associated with the source sense to the target sense can 
simply be avoided. Rather than one-to-one mappings 
between lexical items, the dictionary would map between
sets of lexical items². For the transfer approach,
one consequence of making the semantic structures local
is a much broader concept of the bilingual dictionary.
For instance, English break maps to a set of Chinese
verbs such as da sui (break into many pieces), da
puneig (break continuity), da po (break into irregularly
shaped pieces). Correspondingly, da sui would map
to a set of English verbs such as break, shatter, crumble.
The final selection of the actual lexical item will
be made in the target language based on the semantic
features associated with the prospective arguments.
This type of approach can be used to match a lexical
item to another lexical item, or it could also be
used to match a lexical item to a 'deep semantic' representation,
such as an interlingua. As such, it could
be utilized in either a transfer or interlingua based system.
One of the advantages of this approach is that
the same self-contained, language-specific representation
that is normally used for any form of analysis or
generation becomes very applicable to machine translation [9].
More importantly, it is not necessary that
the languages being translated have the same underlying
verb classes, since the semantic structure is local to
each language. However, we cannot entirely avoid the
issue of finding the conceptual links between language-specific
classification schemes. We are still left with
the problem, given the different classification schemes,
of associating appropriate classes of lexical items in a
target language with the most closely corresponding
class in the source language. Since we have just argued
that there will never be exactly corresponding classes in
any two languages, this is clearly still a difficult issue.
However, we do not have to try to force the different
classification schemes into a single interlingua. It might
be that the most useful method for taking advantage
of our approach would be in a hybrid system that uses
a direct transfer method in certain circumstances, and
a more general, classification-correspondence approach
in other circumstances.

²This is similar in motivation to the interlingua approach,
where the goal is to capture semantic similarities by associating
several lexical items with the same primitive concept. In
the same way, we are grouping semantically similar lexical items
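The set-to-set bilingual dictionary described in this section can be sketched as follows. This is a minimal illustration, not the implementation reported in this paper: the table and function names are hypothetical, the entries are limited to the examples given in the text, and the verb spellings follow Figure 1 (da sui, da puneig).

```python
# Set-valued transfer entries, limited to the examples given in the text.
# Table and function names are hypothetical.
EN_TO_ZH = {"break": {"da sui", "da puneig", "da po"}}
ZH_TO_EN = {"da sui": {"break", "shatter", "crumble"}}


def transfer_candidates(word, dictionary):
    """Coarse-grained transfer step: return every candidate for a source word.

    No choice is made here. The final selection is left to the target
    language, which checks its own semantic features against the arguments.
    """
    return sorted(dictionary.get(word, set()))


print(transfer_candidates("break", EN_TO_ZH))
print(transfer_candidates("da sui", ZH_TO_EN))
```

Because each entry maps a word to a set, neither table needs to record which semantic features the other language uses to choose within that set.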

4 Implementation 

We have begun to implement this model in a variant of 
the Synchronous TAGs formalism, a Lexicalized TAG 
suitable for machine translation [13, 1], which has been 
augmented to handle feature-based unification. This 
particular formalism has a number of advantages for 
our approach. First, it is lexicalized, which makes it 
easier to specify the lexically specific semantic infor- 
mation in a syntactic context. This is important in 
languages such as English where the semantics can have 
syntactic consequences [6]. Second, it is feature-based, 
which provides a convenient notation and mechanism in 
which to specify the selectional restrictions. Third, the 
extended domain of locality provided by the tree struc- 
tures allows lexical items to easily place constraints on 
other lexical items in the same frame. The disadvantages
of FB-LTAGs include difficulty in specifying complex
feature hierarchies, and a unification system that
would not be able to take advantage of class inclusion. 

Although Synchronous TAGs can also be used with 
an interlingua [14], we chose to start with a transfer-based
approach between the languages Chinese and English.
We will work through a simple example to show
how Synchronous FB-LTAGs handle this method. Semantic
constraints are specified in the usual way for
each language. The semantic characteristics of a lexical
item (or each sense of a lexical item) are instantiated
as features in the syntactic lexicon. A lexical item may
also specify constraints on semantic features of other
lexical items available in its syntactic frame (i.e. local
to its tree). At parse time, of course, the features and
feature constraints must unify. This is done independently
for each language.
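The requirement that features and feature constraints must unify can be illustrated with a small sketch over nested dictionaries. It is a simplification and not the FB-LTAG unifier itself: the function name `unify` is introduced here, feature structures are plain Python dicts, and co-indexed (shared) values are not modelled.

```python
def unify(a, b):
    """Unify two feature structures represented as nested dicts.

    Returns the merged structure, or None when two atomic values clash.
    Simplifying assumption: no co-indexing between feature values.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            # A feature present on one side only is simply carried over.
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            # Two different atomic values cannot be unified.
            return None
    return result


constraint = {"sem": {"realm": "physical", "form": "brittle"}}
brittle_noun = {"sem": {"realm": "physical", "shape": "irregular", "form": "brittle"}}
abstract_noun = {"sem": {"realm": "mental", "shape": "continuous"}}

print(unify(constraint, brittle_noun))
print(unify(constraint, abstract_noun))
```

A constraint that mentions fewer features than the noun still unifies, which is what lets each language state restrictions only at the granularity it needs.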

Our syntactic lexicon lists, among others, 5 Chinese 
expressions for the concept break: 

• da sui (hit into many pieces)

• da duan (break into line segments)

• da po (hit irregular)

• da hui (make nonfunctional)

• da puneig (discontinue a journey or song)

The lexical item for each Chinese verb specifies in 
its features what semantic restrictions it places on its 
object³. Each noun also specifies its semantic categories,
at the granularity that is necessary for this particular
language. For instance, the Chinese verb da sui
takes an object that is a physical object and is brittle,
while the verb da puneig takes a continuous abstract
object, as illustrated in Figure 1. The noun huapin
(vase) is, among other things, a physical, irregularly
shaped, brittle object, and the noun lucheng (journey)
is a continuous, abstract object. The corresponding 
noun phrase trees are shown in Figure 2. 

When translating the English sentence John broke 
the vase, the lexical transfer for break maps from the 
semantic concept /BREAK/ in English to the seman- 
tic concept /BREAK/ in Chinese. This includes all 
5 Chinese expressions listed above. Thus a number 
of Chinese translations are initially generated, but the 
semantic feature constraints imposed by most of the 
verbs will cause the sentence to fail. For instance, da 
puneig would not be able to unify with the noun huapin.
The sentence John broke the vase would then necessarily
be translated as Ji-Yong da sui huapin, while John
broke the journey would be translated as Ji-Yong da
puneig lucheng. We were able to correctly translate
the English sentence without specifying semantic
information in the English syntactic lexicon that was
critical to the correct lexical selection in Chinese.

²(continued) together, but we are retaining the complete semantic representations
with selection restrictions for the individual lexical items.
The group "class" or "concept" is not a substitute for the individual
semantic representation, but an enhancement.

³The same is true on the English side where we differentiate
between different senses of the lexical item break, which are distinguished
by the object of the clause, i.e. functional break vs
physical break [10]. These senses have different syntactic behaviors
in English. Critically, though, the distinctions necessary for
English are not forced onto the Chinese breaking verbs (or vice
versa).

[Figure 1 shows two verb trees side by side. Each has the nodes NP0, VP, V and NP1,
and the sem value <1> is shared between the verb and the object NP1.
  da sui tree:    <1> = [realm: physical, form: brittle]
  da puneig tree: <1> = [realm: mental, shape: continuous]]

Figure 1: Two trees corresponding to English break

[Figure 2 shows two noun phrase trees, each an NP over N with sem: <1>.
  huapin tree:  <1> = [realm: physical, shape: irregular, form: brittle, type: physical]
  lucheng tree: <1> = [realm: mental, shape: continuous]]

Figure 2: NP trees for Chinese vase and journey
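The worked example can be sketched end to end using the feature values of Figures 1 and 2. This is an illustration only. It covers just the two verbs whose restrictions appear in Figure 1, the transfer tables and function names are hypothetical, and the flat `compatible` check is a simplified stand-in for unification.

```python
# Object restrictions from Figure 1 (only these two verbs have features given).
VERB_RESTRICTIONS = {
    "da sui": {"realm": "physical", "form": "brittle"},
    "da puneig": {"realm": "mental", "shape": "continuous"},
}

# Noun features from Figure 2.
NOUN_FEATURES = {
    "huapin": {"realm": "physical", "shape": "irregular",
               "form": "brittle", "type": "physical"},
    "lucheng": {"realm": "mental", "shape": "continuous"},
}

# Hypothetical transfer tables for this example.
NOUN_TRANSFER = {"vase": "huapin", "journey": "lucheng"}
BREAK_CANDIDATES = ["da sui", "da puneig"]


def compatible(restriction, features):
    """Simplified stand-in for unification: no shared feature may clash."""
    return all(features.get(key, value) == value
               for key, value in restriction.items())


def translate_break(english_object):
    """Generate every candidate verb, then keep those the object satisfies."""
    noun = NOUN_TRANSFER[english_object]
    return [
        (verb, noun)
        for verb in BREAK_CANDIDATES
        if compatible(VERB_RESTRICTIONS[verb], NOUN_FEATURES[noun])
    ]


print(translate_break("vase"))
print(translate_break("journey"))
```

Note that nothing on the English side records brittleness or continuity. The filtering uses only features stated in the Chinese lexicon.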



5 Future Work and Conclusion 

This work is initial work on a problem of Machine
Translation that has often been ignored or relegated
to 'pragmatics' or 'world knowledge'. As such, there
remains much more work to be done, from extending our
implementation described here to include a larger set of
lexical items, to working on semantic ontologies for the
languages that we are interested in, to questions such
as how much and what kind of information is really
language specific. Unless we are claiming that no features
need to be shared between language translation
pairs, which we are not, a decision must still be made
about what information should be transferred between
the languages. A related question arises for interlingua 
approaches - what information should be included in 
the underlying semantic representation. It is not at all 
clear to us where that line should be drawn. 

References 

[1] Anne Abeillé, Yves Schabes, and Aravind K. Joshi.
Using lexicalized TAGs for machine translation. In Proceedings
of the International Conference on Computational
Linguistics (COLING '90), Helsinki, Finland, 1990.

[2] J.C. Carbonell, R.E. Cullingford, and A.V. Gershman.
Steps toward knowledge-based machine translation.
IEEE Transactions on Pattern Analysis and
Machine Intelligence, 3:376-392, 1981.

[3] Bonnie Jean Dorr. Machine Translation: A View from
the Lexicon. MIT Press, Cambridge, Mass, 1993.

[4] Aravind K. Joshi, L. Levy, and M. Takahashi. Tree
adjunct grammars. Journal of Computer and System
Sciences, 1975.

[5] Christian Leclerc. Organisation du lexique-grammaire
des verbes français. In Langue Française: Dictionnaires
Électroniques du Français. Larousse, September 1990.

[6] Beth Levin. English Verb Classes and Alternations: A
Preliminary Investigation. The University of Chicago
Press, 1993.

[7] T. Mitamura. The Hierarchical Organization for Predicate
Frames for Interpretive Mapping in Natural Language
Processing. PhD thesis, University of Pittsburgh,
Pittsburgh, Pennsylvania, USA, 1989.

[8] S. Nirenburg, J. Carbonell, M. Tomita, and K. Goodman.
Machine translation: a knowledge-based approach.
Morgan Kaufmann, San Mateo, California,
USA, 1992.

[9] Martha Palmer, Rebecca Passonneau, Carl Weir, and
Tom Finin. The KERNEL text understanding system.
Artificial Intelligence, 63:17-68, 1993.

[10] Martha Palmer and Alain Polguere. A lexical and conceptual
analysis of BREAK. In Lexical Computational
Semantics. Cambridge University Press, 1994.

[11] Clifton Pye. Breaking concepts: Constraining predicate
argument structure. Presented at the Kansas
Linguistics Workshop, Lawrence, Kansas, USA, 1993.

[12] Yves Schabes. Mathematical and computational aspects
of lexicalized grammars. Ph.D. thesis MS-CIS-90-48,
LINC LAB179, Computer Science Department,
University of Pennsylvania, Philadelphia, PA, 1990.

[13] Stuart Shieber and Yves Schabes. Synchronous Tree
Adjoining Grammars. In Proceedings of the 13th International
Conference on Computational Linguistics
(COLING '90), Helsinki, Finland, 1990.

[14] Jiping Sun. Interlingua-based MT through synchronous
TAG. In International Workshop on NLU
and AI, Fukuoka, Japan, 1992.

[15] Bernard Vauquois and Christian Boitet. Automated
translation at Grenoble University. Computational
Linguistics, 11, Number 1, 1985.

[16] K. Vijay-Shanker and Aravind K. Joshi. Unification
based Tree Adjoining Grammars. In J. Wedekind, editor,
Unification-based Grammars. MIT Press, Cambridge,
MA, 1991.

[17] Zhibiao Wu and Martha Palmer. Verb semantics and
lexical selection. In 32nd Meeting of the Association for
Computational Linguistics, Las Cruces, New Mexico,
1994.

[18] Hiroshi Yasuhara. Conceptual transfer in an interlingua
method and example based MT. In Proceedings
of the Natural Language Pacific Rim Symposium
(NLPRS '93), Fukuoka, Japan, December 1993.



